I would like to congratulate Professor Rao on having
produced an overview of survey methodology
which is at the same time a broad-ranging prospec- 
tus of current research and also an impressive retro- 
spective from a modern viewpoint of the early his- 
torical developments. He shows us in broad terms 
where the various approaches to survey methodol- 
ogy have been successful and where they cannot 
quite be relied upon without further development. 

Most of the paper is not specifically directed at 
contrasting the Bayesian and frequentist viewpoints. 
The most important distinctions for Rao seem to be 
between model-dependent and design-based meth- 
ods, and Bayes methods are faulted in Rao's cho- 
sen terrain of "the large-scale production of official 
statistics from complex surveys" primarily for us- 
ing models where models are not absolutely nec- 
essary. He takes for granted that models will be 
used in adjusting for nonresponse, in his formula- 
tion largely through calibration, and in small area 
estimation. The faults he finds with unnecessarily 
model-dependent survey estimation methods are: 

• design-inconsistency (of model-based BLUP un- 
der misspecified models, and in other examples, 
in Section 3.2); 

• requiring different sets of predictor variables for 
different attributes of interest (in Section 3.3); 

and in Section 4.2, in relation to the nonparamet- 
ric Bayesian and pseudo-Bayesian methods relying 
heavily on exchangeability, for their 
• lack of generalizability to complex survey designs
with clustering and unequal probability weighting. 

Like many authors in survey sampling, Rao faults 
model-based analyses because of possible model mis- 
specification. This discussion highlights aspects and 
consequences of model misspecification under the 
headings of Rao's paper. 

1. MODEL MISSPECIFICATION IN LINEAR 
REGRESSION AND CALIBRATION 

In Section 3.1 of his paper, Rao considers the be- 
havior of a calibration estimator (of a population 
total) when the calibration constraints involve some 
but not all of the predictor variables entering a true 
superpopulation model. The context is a superpop- 
ulation in which the regression model 

(1)    $Y_i = \beta' X_i + \gamma Z_i + \varepsilon_i$

holds for all units $i$ in the frame $U$, with auxiliary
variables $X_i, Z_i$ known for all population units, and
where it is desired to estimate the total $t_y = \sum_{i \in U} Y_i$
based on a probability sample of units $i \in S$ with
first-order inclusion weights $d_i = 1/\pi_i$. [In Rao's example,
the weights $d_i$ are all equal, $X_i = (1, x_i)'$, and
$Z_i = x_i^2$, for a scalar auxiliary variable $x_i$.] A calibration
estimator of $t_y$ might be based on the variables $X_i$ alone,
that is, on $\sum_{i \in S} w_i Y_i$, where the modified
weights $w_i$ are determined by minimizing
$\sum_{i \in S} (w_i - d_i)^2 / d_i$ subject to the constraints
$\sum_{i \in S} w_i X_i = \sum_{i \in U} X_i$. As described by Rao, it turns
out that this calibration estimator is equivalent to
the generalized regression (GREG) estimator based
on the weights $d_i$ and the predictor variable $X_i$. In
the setting with constant $d_i$, this estimator would
be the unweighted model-based regression estimator
based on predictor $X_i$.
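
The equivalence just described can be checked numerically. The following is a minimal sketch, not Rao's own computation: it assumes a simulated population in which (1) holds with a quadratic term, an equal-probability sample, and hypothetical function names (`linear_calibration_weights`, `greg_total`) introduced only for illustration. The weights come from the Lagrange-multiplier solution $w_i = d_i (1 + X_i' \lambda)$ of the minimization problem above.

```python
import numpy as np

def linear_calibration_weights(d, X, totals):
    """Minimize sum((w - d)**2 / d) subject to X.T @ w == totals.

    Closed form: w_i = d_i * (1 + x_i' lam), where lam solves
    (sum_i d_i x_i x_i') lam = totals - sum_i d_i x_i.
    """
    M = X.T @ (d[:, None] * X)                  # sum_i d_i x_i x_i'
    lam = np.linalg.solve(M, totals - X.T @ d)  # Lagrange multipliers
    return d * (1.0 + X @ lam)

def greg_total(d, X, y, totals):
    """GREG estimator: Horvitz-Thompson total plus a regression adjustment."""
    B = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y))
    return d @ y + (totals - X.T @ d) @ B

# Simulated frame (an assumption for illustration): model (1) with Z_i = x_i**2.
rng = np.random.default_rng(0)
N, n = 1000, 100
x = rng.gamma(2.0, 1.0, size=N)
y_pop = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(size=N)
X_pop = np.column_stack([np.ones(N), x])        # X_i = (1, x_i)'

sample = rng.choice(N, size=n, replace=False)   # equal-probability sample
d = np.full(n, N / n)                           # constant design weights
X, y = X_pop[sample], y_pop[sample]
totals = X_pop.sum(axis=0)                      # known totals of X_i over U

w = linear_calibration_weights(d, X, totals)
cal_total = w @ y
greg = greg_total(d, X, y, totals)
```

Under these assumptions `cal_total` and `greg` agree up to rounding, and the calibrated weights reproduce the known totals of $X_i$ even though the quadratic term $Z_i$ is omitted from the constraints.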

As Rao suggests, calibration might be based on 
a subset of the appropriate predictor variables when 
the same universal calibration constraints are used 
over many different choices of response variables. In
the context (1) above, there are three ways in which
this calibration estimator based on variables $X_i$
might be inadequate. First, the weights $d_i$ used in
the estimator might not be the correct ones: for example,
when the unweighted regression estimator is
used but the design weights are not constant, this
is a familiar kind of wrong-model inconsistency that
arises in Section 3.2. Second, the calibration totals
$\sum_{i \in U} X_i$ fixed in defining the estimator might not be
correct: this may be viewed as a failure of the
frame-coverage model. [A superpopulation-based treatment
of linear calibration with inaccurate totals is given
in Slud and Thibaudeau (2010), Proposition 1, in
a more general setting also involving nonresponse
adjustment and weight-compression.] Third, as mentioned
in Rao's paper with reference to Rao, Jocelyn
and Hidiroglou (2003), the coverage of the
confidence intervals for $t_y$ based on this calibration
estimator might not be close to nominal in moderate
samples. The first two of these three cases represent
actual design inconsistency. In the third case, if the weights
and calibration totals are correct, then the calibration
estimator based on $X_i$ is still a model-assisted
GREG estimator and therefore design-consistent under
general conditions. The problematic coverage
of its confidence intervals seems instead to be due to slow
convergence to the limiting normal distribution,
which Rao, Jocelyn and Hidiroglou (2003)
found to be related to skewness of the residuals from
the incorrect linear regression model of $Y_i$ on $X_i$
when (1) holds with nonzero $\gamma$. This failure of
moderate-sample coverage of confidence intervals due to
slow distributional convergence is more subtle than
design-inconsistency, but may still be important in
practice in surveys where regressions are done separately
in each stratum, since the whole sample might
be large while the individual strata might all have
moderate sample size.

2. DIAGNOSTICS IN SMALL AREA 
ESTIMATION 

One survey-sampling task where all practitioners 
would agree on the necessity of explicit models is 
Small Area Estimation. When survey estimates are 
required for small domains where little or no sam- 
ple is available, models perform a function of driving 
direct estimates toward covariate-defined predictors, 
providing extrapolated estimates in domains where 
there is no sample and shrinking direct estimates
for covariate-defined similar domains together. The
most convenient small area estimation models, whe- 
ther hierarchical Bayes or generalized-linear with 
aggregate-level random effects, have the same form 
for all domains in the frame population. For any 
specific proposed model, this is an assumption that 
requires checking and may prove crucial to the qual- 
ity of small area estimates or predictions. Yet there 
is remarkably little work on goodness-of-fit check- 
ing in small area models, and hardly any mention of 
the topic in the present paper, due in part to Rao's 
focus in Section 5 on Hierarchical Bayes methods. 

Goodness-of-fit and model-checking methods have 
been studied in the survey literature, with impor- 
tant contributions by Rao himself. Chi-squared tests 
based on survey cross-classifications were studied 
in a series of papers leading up to Rao and Scott 
(1984), and are widely cited but perhaps not much 
used in model-checking. A different chi-squared test, 
based on estimated cell-frequencies in multi-way ta- 
bles and suited to small area models, was given by 
Jiang, Lahiri and Wu (2001), work which was ex- 
tended to tests for mixed linear model diagnostics 
studied in Jiang (2001), again in a form which could 
be used in assessing the fit of a small area model. 
In a different direction, the paper of Eltinge and 
Yansaneh (1997) is unusual in providing diagnostics 
for nonresponse adjustment cells in surveys. Apart 
from these papers, diagnostics are often borrowed 
from parametric nonsurvey statistics in individual 
survey applications. 

The Census Bureau's Small Area Income and Poverty
Estimates (SAIPE) program, mentioned by
Rao as a source of examples for small area method- 
ology, has provided an extensive test-bed for small 
area model-checking techniques (Citro and Kalton, 
2000). As described in Rao [(2003), Chapter 7] and 
Citro and Kalton (2000), the county-level log-count 
model for poor children had the Fay-Herriot form 

(2)    $y_i = x_i' \beta + u_i + e_i$,
       $u_i \sim \mathcal{N}(0, \sigma_u^2)$,  $e_i \sim \mathcal{N}(0, v_e / n_i)$,

where $y_i$ is the direct-estimated log-count of poor
children in county $i$, $x_i$ is a vector of covariate
predictors, $n_i$ is the number of sampled households, $u_i$
is the county-level random effect, and $e_i$ are random
survey errors with variances assumed known.
Roughly 20% of sampled counties, with positive
$n_i$, yielded no poor children and therefore would
have provided direct estimates of zero poor children.
Because the logarithms of these estimates are undefined,
those counties were dropped from the model-fitting
analysis. Despite the very effective small area predictions
generated by fitting unknown parameters $\beta$
to the set of sampled counties with well-defined $y_i$, it
remains questionable whether that fitted model (2) 
should be used to predict numbers of poor children 
in counties where no poor children were seen. This is 
an issue of model specification, which has been stud- 
ied for a number of years (Slud, 2003, 2004) and for 
which diagnostics have now been developed in Slud 
and Maiti (2010) by regarding the dropped counties 
as having been left-censored (or left-truncated) be- 
cause they are dropped when the count of sampled 
poor children is below a threshold. These diagnostics
seem to show that the model (2) adequately
describes the counties with well-defined $y_i$, but that
the same model cannot adequately predict in which 
counties there would be any poor children in a sam- 
ple. The upshot is that no model is yet known which 
can account for counts of sampled poor children in 
all counties. 
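
To make concrete what it means to use the fitted model (2) in a county with no usable direct estimate, here is a minimal sketch under stated assumptions. It treats $\sigma_u^2$ and $v_e$ as known (in practice they would be estimated), uses made-up numbers rather than SAIPE data, and the function name `fay_herriot_predict` is hypothetical. Counties with a well-defined $y_i$ receive a shrinkage predictor that combines $y_i$ with $x_i' \hat\beta$; counties with undefined $y_i$ receive the purely synthetic value $x_i' \hat\beta$, which is exactly the extrapolation questioned above.

```python
import numpy as np

def fay_herriot_predict(y, X, n, sigma2_u, v_e):
    """Predict area-level means under model (2) with known variances.

    y holds direct log-count estimates, with np.nan where the direct
    estimate is undefined (e.g. zero sampled poor children).
    """
    obs = ~np.isnan(y)
    var_tot = sigma2_u + v_e / n[obs]       # marginal variance of y_i
    Xo, yo = X[obs], y[obs]
    W = 1.0 / var_tot                       # GLS weights
    beta = np.linalg.solve(Xo.T @ (W[:, None] * Xo), Xo.T @ (W * yo))

    synth = X @ beta                        # covariate-defined predictor
    pred = synth.copy()                     # unobserved areas: synthetic only
    gamma = sigma2_u / var_tot              # shrinkage factor in (0, 1)
    pred[obs] = gamma * yo + (1.0 - gamma) * synth[obs]
    return pred, beta

# Illustrative values only (assumed, not taken from SAIPE).
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])])
y = np.array([1.1, 1.9, np.nan, 4.2, 4.8])  # third county has no defined y_i
n = np.array([10, 40, 5, 20, 80])
pred, beta = fay_herriot_predict(y, X, n, sigma2_u=0.05, v_e=1.0)
```

In this sketch the third county's prediction rests entirely on the regression fitted to the other counties, so its quality depends on whether model (2) also describes the dropped counties.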

3. SPECIFICATION OF MULTILEVEL SURVEY 
ANALYSES 

The kind of model-checking described in the pre- 
vious paragraph is important because, while it is 
common for survey data sets (including aggregated 
area-level data sets used in small area modeling) to 
be highly cross-classified by covariates as well as unit 
response versus nonresponse, there is no guarantee 
that a single model can account well for all por- 
tions of the cross-classified population. Such survey 
data naturally suggest multilevel models, but mod- 
els which differ in form on different subsets of the 
population would lead to complicated interaction 
terms and random effects. 

Rao's paper treats multilevel modeling in a fre- 
quentist design-based setting in Section 3.3, under 
the general heading of estimation in complex sur- 
veys; yet when discussing unified models in a small 
area context, he accepts the value of hierarchical- 
Bayes models. Why is that? In general complex sur- 
veys, it seems likely that simultaneous hierarchical- 
Bayes (HB) models could be formulated for unit 
nonresponse, frame coverage errors, and survey re- 
sponses. If reasonable rules could be developed for 
defining prior parameters, then a Bayesian analysis
is not on its face less theoretically acceptable
than a complicated weight-adjustment procedure.
But perhaps one serious objection is that each response
variable would require its own Bayesian model.
Is the greater value of HB models for small area
prediction due to the acceptability in that context of 
a separate model for each survey response variable? 

In the small area context, my own view is that 
hierarchical-Bayes models with objective priors — or 
priors chosen by the matching strategies discussed 
in Section 4 — might very well serve the smoothing 
function of shrinking direct estimators from simi- 
lar areas toward one another. But I feel much less 
comfortable with this class of models being used to 
extrapolate small area predictions to areas with very 
small or zero sample sizes. 

A difficulty with multilevel models, for both fre- 
quentists and Bayesians, is that different hierarchi- 
cal error structures can sometimes be almost im- 
possible to distinguish with useful power for mod- 
erately large sample sizes, as may be revealed by 
information-matrix calculations. Nevertheless, there 
are data sets where (generalized) likelihood ratio 
testing for the presence of certain error structure 
components can be rather decisive. In a spatial small 
area problem, Opsomer et al. (2008) modeled the 
alkalinity of lakes in a survey of lakes in terms of el- 
evation and radial P-spline basis functions in spatial 
coordinates, with the spline-term coefficients as ran- 
dom effects. In addition, independent random effects 
for slightly aggregated geographic units were consid- 
ered and found to be important after likelihood ratio 
testing. It will not always be possible to reach such 
firm conclusions, and this kind of model-comparison 
may be hard to reproduce in a Bayesian framework. 

4. MISCELLANEOUS COMMENTS 

All of us, frequentists and Bayesians, are tied to 
models in the sense that statistical theory generally 
has very little to say about the validity of likelihood- 
based inferences when the parametric model family 
does not contain the model actually governing the 
data. 

For sample survey data, frequentists have always 
found it difficult to say what is an appropriate like- 
lihood. [However, Rao's paper mentions in Section 5 
fascinating work in Wu and Rao (2006), Rao and Wu 
(2010), attempting to interpret empirical-likelihood 
survey methods as a Bayesian nonparametric survey 
likelihood.] A design-based view of finite-population 
sampling forces us to view the ensemble of survey at- 
tributes as nuisance parameters, about which we are 
entitled to assume only a sort of large-superpopulation
stability. A frequentist approach to the high
nuisance-parameter dimension is to base inferences 
on estimating equations, which is how Rao presents 
in Section 3.3 the "model-assisted" pseudo-likelihood 
method of estimating frame-population descriptive 
parameters, such as regression coefficients via GREG, 
and such as the multilevel variance-component pa- 
rameters that are the target of multilevel survey 
estimation. As far as I can tell, this approach has 
no Bayesian counterpart, so the survey analyst who 
wants the protection of correct estimation for vir- 
tually any superpopulation configuration of survey 
attributes has little recourse but to follow design- 
based theory. That seems to be the essence of the 
argument in favor of design-based survey methods 
when models are not absolutely necessary because 
of missing data. 

Weight adjustment for calibration and model-ba- 
sed nonresponse adjustment can also be viewed as 
estimating equation methods. Like other such meth- 
ods, weight-adjustments rely for their validity on 
correctness of at least some model assumptions: as 
Rao mentions, the most we can hope for in this en- 
terprise is a kind of "double robustness" in which 
design-consistency for the weighted survey estimator
obtains when either the model used for nonresponse
adjustment or a population-wide regression-type
model is correct. See Kang and Schafer (2007)
for related exposition of the double-robustness concept,
and Slud and Thibaudeau (2010) for analogous
results on a further development of the optimization-based
weight-adjustment method of Deville and
Särndal (1992) to cover simultaneous weight-adjustment
for nonresponse, calibration and weight-compression.

Survey estimation is often an exercise in predic- 
tion, and it is known in many statistical problems 
that excellent predictions can be provided through 
estimating models which are too simple to pass good- 
ness-of-fit checks. This observation has not yet been 
formulated with mathematical care — no one knows 
how to characterize which target parameters and 
which combinations of true and oversimplified mod- 
els could work in this way — but frequentists and 
Bayesians would all benefit from a rigorous result 
of this type. 